REDAC: Distributed, Asynchronous Redundancy in Shared Memory Servers

Authors

Brian T. Gold

Babak Falsafi

James C. Hoe

Ken Mai

Abstract

The emergence of multi-core architectures, driven by continued technology scaling, has led to concerns about increasing soft- and hard-error rates in commodity designs. Because modern chip designs consist of multiple high-speed clock domains, conventional lockstepped redundant execution is no longer practical. Recent work suggests an asynchronous approach to redundant execution, where processor pairs independently execute an instruction stream and treat any differences like soft errors, invoking rollback recovery. Because prior designs buffer instruction results within the out-of-order instruction window, they are limited to tightly coupled redundancy within a single chip, which limits availability and serviceability in the presence of hard errors. We propose REDAC, a set of lightweight mechanisms for distributed, asynchronous redundancy within a shared-memory multiprocessor. REDAC provides scalable buffering for unchecked state updates, permitting the distribution of redundant execution across multiple nodes of a scalable shared-memory server. The REDAC mechanisms achieve high performance by enabling speculation across common serializing instructions and mitigating the effects of input incoherence. We evaluate REDAC using cycle-accurate full-system simulation of common enterprise workloads and show that performance overheads average just 10% when compared to a non-redundant system. These results are comparable to the performance of a similarly configured lockstep design, but offer the substantial benefits of asynchronous redundancy.
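The general idea of asynchronous redundant execution with rollback recovery, as summarized in the abstract, can be illustrated with a toy sketch. This is not the REDAC design: the `Core` class, the `run_redundant` function, the integer "state", the fixed comparison interval, and the injected bit flip are all hypothetical simplifications chosen for illustration. Two cores run the same operations independently, their states are compared only at interval boundaries, and a mismatch is treated like a soft error by restoring the last checked state.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

Op = Callable[[int], int]


@dataclass
class Core:
    """Toy core whose whole architectural state is one integer (an assumption)."""
    state: int = 0
    fault_at: Optional[int] = None  # step index at which to inject one transient bit flip

    def step(self, op: Op, index: int) -> None:
        self.state = op(self.state)
        if self.fault_at == index:
            self.state ^= 1       # simulate a soft error
            self.fault_at = None  # transient: it does not recur on re-execution


def run_redundant(ops: List[Op], interval: int = 4,
                  fault_at: Optional[int] = None) -> Tuple[int, int]:
    """Run ops on two cores, comparing state every `interval` steps.

    Returns (final_state, number_of_rollbacks).
    """
    lead, trail = Core(fault_at=fault_at), Core()
    checkpoint_state, checkpoint_pc = 0, 0  # last state both cores agreed on
    pc, rollbacks = 0, 0
    while pc < len(ops):
        end = min(pc + interval, len(ops))
        # Each core executes the interval on its own; there is no per-step lockstep.
        for i in range(pc, end):
            lead.step(ops[i], i)
        for i in range(pc, end):
            trail.step(ops[i], i)
        if lead.state == trail.state:
            # States agree: the interval is checked and becomes the new checkpoint.
            checkpoint_state, checkpoint_pc = lead.state, end
            pc = end
        else:
            # Any difference is treated like a soft error: roll both cores back.
            lead.state = trail.state = checkpoint_state
            pc = checkpoint_pc
            rollbacks += 1
    return lead.state, rollbacks


if __name__ == "__main__":
    ops = [lambda s, k=k: s * 3 + k for k in range(10)]
    print(run_redundant(ops))              # fault-free run
    print(run_redundant(ops, fault_at=5))  # one injected soft error
```

In this sketch the work done since the last checkpoint stands in for the "unchecked state updates" the abstract mentions; a real system would buffer those updates in hardware and compare compact summaries rather than full state.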


Related resources

Shared Distributed Memory : the Workspace Model

Shared Distributed Memory Systems offer uniform access to data that are distributed across servers. The Workspace model is a model of shared distributed memory. It is based on communicating processes that are both clients and servers. It makes it possible to implement hierarchical views of data and to enhance security, and it adapts to heterogeneous networks.


Simulating a Shared Register in a System that Never Stops Changing

Simulating a shared register can mask the intricacies of designing algorithms for asynchronous message-passing systems subject to crash failures, since it allows them to run algorithms designed for the simpler shared-memory model. Typically such simulations replicate the value of the register in multiple servers and require readers and writers to communicate with a majority of servers. The succ...


Storage-Efficient Shared Memory Emulation

Improvements in communication fabrics have enabled access to ever larger pools of data with decreasing access latencies, bringing large-scale memory fabrics closer to feasibility. However, with an increase in scale come new challenges. Since more systems are aggregated, maintaining a certain level of reliability requires increasing the storage redundancy, typically via data replication. The cor...


A Turn Function Scheme Realized in the Asynchronous Single-Writer/Multi-reader Shared Memory Model

We consider a set of users wishing to receive a service in an asynchronous distributed system. Such users declare their wishes and then wait to gain admittance to be served. Except for the initial transient period, at least one user must be waiting to be served, and the system should be as fair as possible for users. A procedure that ensures such a situation is called a turn function. It can be...


Distributed Symbolic Computation with DTS

We describe the design and implementation of the Distributed Threads System (DTS), a programming environment for the parallelization of irregular and highly data-dependent algorithms. DTS extends the support for fork/join parallel programming from shared memory threads to a distributed memory environment. It is currently implemented on top of PVM, adding an asynchronous RPC abstraction and tur...



Publication date: 2008
